The main goals for 8/22-9/8:

  1. Explore Synechococcus further:
    • Are the oligotyping/MED results robust? Try subsampling to even depth and rerunning.
    • Can we really tell whether the variation is just 16S copy-number variants? What about intraspecific variation?
  2. Explore Limnohabitans:
    • It can be a phototroph or a heterotroph.
    • Oligotype it.
  3. Explore chloroplasts:
    • Oligotype them.
  4. Look at correlations among environmental variables.

  5. Work on fixing Edna’s code

Synechococcus

Here is what the old Synechococcus oligotyping results look like:

To check the sensitivity of the Synechococcus oligotyping results, I subsampled the data in mothur to 5,000 reads per sample and made a new FASTA file to oligotype. With this cutoff, the total number of reads per sample (including chloroplasts) is even. To even out bacterial (non-chloroplast) reads per sample, I would have to subsample down to 500 or 1,000.

The new results look identical to the old ones, which at first made me think I had done something wrong. But the original FASTA of Synechococcus reads has 325,410 lines (1,432 unique sequences) and the subsampled FASTA has 323,838 lines (476 unique). So the subsampling did work: it removed the rare reads assigned to Synechococcus, which are probably filtered out by the MED pipeline anyway.
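
Subsampling to even depth (rarefying) draws a fixed number of reads from each sample without replacement, so rare variants are the most likely to drop out, which fits the drop from 1,432 to 476 unique sequences. A minimal Python sketch of the idea, assuming per-sample counts are stored as a dict of sequence ID to read count (a hypothetical layout; this is not mothur's own implementation):

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample one sample's read counts to `depth` reads without replacement.

    `counts` maps a sequence (or OTU) identifier to its read count.
    Samples with fewer than `depth` reads cannot be rarefied and raise ValueError.
    """
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than depth={depth}")
    # Expand counts into one entry per read, then draw `depth` reads without replacement.
    pool = [seq for seq, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)
    return dict(Counter(drawn))


# Hypothetical sample: two abundant variants plus three singletons.
sample = {"seqA": 6000, "seqB": 3000, "rare1": 1, "rare2": 1, "rare3": 1}
sub = rarefy(sample, depth=5000)
print(sum(sub.values()))  # 5000
print(len(sample), "->", len(sub))  # unique variants before -> after; singletons often drop out
```

Fixing the seed makes a rarefied table reproducible; rerunning with several seeds is a cheap way to check that a result does not hinge on one particular draw.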

The talk title you sent me said that Synechococcus has higher diversity in less saline systems and is negatively correlated with total N. The negative correlation with N does not seem to hold in our system: if anything, Synechococcus blooms more during the earlier part of the season, when N is high. Lake Erie is fairly saline for a freshwater lake, but obviously much less so than the Baltic.

Funny aside: if you search "lake erie salinity" on Google, one of the top results is this article from the Christian Science Monitor: http://www.csmonitor.com/1982/0414/041450.html

Limnohabitans

It’s unclear to me whether there is true ecological variation in Limnohabitans or whether the three oligotypes just represent 16S copy variants. The proportions of the three relative to each other don’t change much throughout the season.
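
One way to put a number on "the proportions don't change much": if the three oligotypes were copy variants within one genome, each one's share of the Limnohabitans reads should stay roughly fixed across samples, so its coefficient of variation (SD / mean) should be low. A low CV is consistent with copy variants but does not prove it, since tightly co-varying populations would look the same. A minimal sketch, assuming a hypothetical table of per-sample counts for the three oligotypes:

```python
import statistics


def within_group_shares(samples: list[list[int]]) -> list[list[float]]:
    """Convert per-sample counts of k oligotypes into within-group proportions.

    Each inner list holds the read counts of the same k oligotypes in one sample;
    samples where all k counts are zero are skipped.
    """
    shares = []
    for counts in samples:
        total = sum(counts)
        if total > 0:
            shares.append([c / total for c in counts])
    return shares


def share_cv(samples: list[list[int]]) -> list[float]:
    """Coefficient of variation (SD / mean) of each oligotype's share across samples."""
    shares = within_group_shares(samples)
    cvs = []
    for column in zip(*shares):
        mean = statistics.mean(column)
        cvs.append(statistics.stdev(column) / mean if mean > 0 else float("nan"))
    return cvs


# Hypothetical counts of three Limnohabitans oligotypes in four samples.
counts = [[60, 30, 10], [120, 61, 19], [30, 14, 6], [0, 0, 0]]
print([round(cv, 3) for cv in share_cv(counts)])
```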

In your email you said that Limnohabitans is a typical copiotroph, reacting positively to algal blooms but also to increased input of DOC after heavy rain events.

I made a bunch of scatter plots to look at relationships between Limnohabitans abundance and other variables, and fit a simple linear model of log(Abundance) against each one:

## 
## Call:
## lm(formula = log(Abundance) ~ log(Nitrate))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.73474 -0.71543  0.04435  0.61351  2.83915 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.64879    0.26722   2.428   0.0158 *  
## log(Nitrate)  0.30085    0.05004   6.012 5.84e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.074 on 273 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.1169, Adjusted R-squared:  0.1137 
## F-statistic: 36.15 on 1 and 273 DF,  p-value: 5.842e-09

## 
## Call:
## lm(formula = log(Abundance) ~ log(Ammonia))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3768 -0.7432  0.1272  0.7609  2.5015 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.90038    0.09229  20.591  < 2e-16 ***
## log(Ammonia)  0.21272    0.04454   4.775 2.93e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.098 on 273 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.07709,    Adjusted R-squared:  0.07371 
## F-statistic:  22.8 on 1 and 273 DF,  p-value: 2.931e-06

## 
## Call:
## lm(formula = log(Abundance) ~ log(N.P))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5421 -0.7690  0.1790  0.7567  2.5938 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0509     0.3073   9.928  < 2e-16 ***
## log(N.P)     -0.4447     0.1601  -2.778  0.00586 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 268 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.02798,    Adjusted R-squared:  0.02436 
## F-statistic: 7.716 on 1 and 268 DF,  p-value: 0.005861

## 
## Call:
## lm(formula = log(Abundance) ~ log(SRP))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3035 -0.7993  0.1492  0.7526  2.9558 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.22635    0.07239  30.756   <2e-16 ***
## log(SRP)    -0.05889    0.04003  -1.471    0.142    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.141 on 268 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.008009,   Adjusted R-squared:  0.004307 
## F-statistic: 2.164 on 1 and 268 DF,  p-value: 0.1425

## 
## Call:
## lm(formula = log(Abundance) ~ log(POC))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.69548 -0.72431  0.08871  0.72496  2.49203 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.29778    0.06285  36.558  < 2e-16 ***
## log(POC)    -0.55752    0.06882  -8.101 1.84e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.026 on 273 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.1938, Adjusted R-squared:  0.1908 
## F-statistic: 65.62 on 1 and 273 DF,  p-value: 1.835e-14

## 
## Call:
## lm(formula = log(Abundance) ~ LogChla)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.88762 -0.62302  0.05095  0.75092  2.86467 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.57115    0.15860  22.517   <2e-16 ***
## LogChla     -0.40093    0.04316  -9.289   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9958 on 273 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.2402, Adjusted R-squared:  0.2374 
## F-statistic: 86.29 on 1 and 273 DF,  p-value: < 2.2e-16

## Warning in log(LogPhyco): NaNs produced
## 
## Call:
## lm(formula = log(Abundance) ~ log(LogPhyco))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.90274 -0.77604  0.03967  0.73097  2.72162 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.12768    0.09670  22.003  < 2e-16 ***
## log(LogPhyco) -0.32692    0.07478  -4.372  2.1e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.051 on 176 degrees of freedom
##   (105 observations deleted due to missingness)
## Multiple R-squared:  0.09796,    Adjusted R-squared:  0.09283 
## F-statistic: 19.11 on 1 and 176 DF,  p-value: 2.103e-05

## 
## Call:
## lm(formula = log(Abundance) ~ log(Temp))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7871 -0.7293  0.1697  0.8387  2.5476 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.5358     0.7649   7.238 5.86e-12 ***
## log(Temp)    -1.1025     0.2576  -4.280 2.68e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.082 on 245 degrees of freedom
##   (36 observations deleted due to missingness)
## Multiple R-squared:  0.06957,    Adjusted R-squared:  0.06577 
## F-statistic: 18.32 on 1 and 245 DF,  p-value: 2.681e-05

## 
## Call:
## lm(formula = log(Abundance) ~ LogParMC)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27348 -0.77399  0.07611  0.82739  2.27695 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.93079    0.08212  23.511  < 2e-16 ***
## LogParMC    -0.25924    0.04211  -6.156 4.99e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.088 on 174 degrees of freedom
##   (107 observations deleted due to missingness)
## Multiple R-squared:  0.1789, Adjusted R-squared:  0.1741 
## F-statistic:  37.9 on 1 and 174 DF,  p-value: 4.995e-09

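The best-fitting linear models by R² were those for chlorophyll a (R² = 0.24), POC (R² = 0.19), and particulate microcystin (R² = 0.18; only 176 samples had data, versus 275 for the other two, so these R² values are not strictly comparable). All three relationships were negative. The phycocyanin model regresses on log(LogPhyco); since LogPhyco appears to be already log-transformed, non-positive values became NaN (hence the warning) and were dropped, so that fit should probably be rerun on LogPhyco directly.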

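There were significant positive relationships between Limnohabitans relative abundance and both nitrate (R² = 0.12) and ammonia (R² = 0.08). The N:P ratio had a weak negative relationship (R² = 0.03), and SRP was not significant (p = 0.14).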

It seems like Limnohabitans is responding more to environmental conditions, such as inputs of allochthonous carbon, than to DOC from the bloom. On October 6th, when Microcystis seems to disappear completely (a large temperature drop, which was probably accompanied by a rain event), Limnohabitans shoots up to almost 10% of the whole community.
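
Each model summary above is a simple linear regression of log(Abundance) on one predictor, and R² is the fraction of variance in log abundance that the predictor explains. A minimal pure-Python sketch of the same kind of fit, assuming paired lists of abundance and predictor values per sample (hypothetical inputs, not the R code that produced the output above). Pairs that are missing or cannot be logged are dropped, similar in spirit to R's default `na.omit`:

```python
import math


def loglog_fit(abundance, predictor, log_predictor=True):
    """OLS fit of log(abundance) on log(predictor), or on predictor if already logged.

    Returns (intercept, slope, r_squared, n_used).
    """
    xs, ys = [], []
    for a, p in zip(abundance, predictor):
        if a is None or p is None or not (math.isfinite(a) and math.isfinite(p)) or a <= 0:
            continue  # missing or non-loggable pair
        if log_predictor:
            if p <= 0:
                continue  # log undefined (R returns NaN for p < 0, -Inf for p == 0)
            p = math.log(p)
        xs.append(p)
        ys.append(math.log(a))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two distinct predictor values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy**2 / (sxx * syy)  # squared Pearson correlation for one predictor
    return intercept, slope, r_squared, n


# Hypothetical data following log(abundance) = 1 + 0.5 * log(predictor) exactly,
# plus one missing and one negative predictor value that get dropped.
predictor = [1.0, 2.0, 4.0, 8.0, float("nan"), -1.0]
abundance = [math.e * math.sqrt(v) for v in (1.0, 2.0, 4.0, 8.0)] + [5.0, 5.0]
b0, b1, r2, n = loglog_fit(abundance, predictor)
print(round(b0, 6), round(b1, 6), round(r2, 6), n)  # 1.0 0.5 1.0 4
```

With a single predictor, R² equals the squared correlation between the two logged variables, which is why the sketch computes it as sxy² / (sxx · syy) rather than from residuals.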

Chloroplasts
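
The conclusions below mention pruning the chloroplast oligotypes down to 28 with an M parameter of 5,000. As I understand the oligotyping/MED documentation, M is the minimum substantive abundance: a group is kept only if its most abundant unique sequence has at least M reads, regardless of the group's total. A minimal sketch of that rule, assuming each oligotype is a hypothetical dict of unique sequence to read count (not the pipeline's own code):

```python
def filter_by_substantive_abundance(
    groups: dict[str, dict[str, int]], m: int
) -> dict[str, dict[str, int]]:
    """Keep groups whose most abundant unique sequence has at least `m` reads."""
    return {
        name: seqs
        for name, seqs in groups.items()
        if seqs and max(seqs.values()) >= m
    }


# Hypothetical oligotypes: oligo_B has 5,500 reads in total, but its most abundant
# unique sequence has only 3,000, so it is removed at m=5000.
groups = {
    "oligo_A": {"AAGT": 12000, "AAGC": 40},
    "oligo_B": {"AGGT": 3000, "AGGC": 2500},
    "oligo_C": {"TAGT": 5000},
}
kept = filter_by_substantive_abundance(groups, m=5000)
print(sorted(kept))  # ['oligo_A', 'oligo_C']
```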

Correlations

I didn’t run any correlations, but I read a bunch of papers on multivariate methods:

mvabund: fits a separate GLM to each OTU using a common set of explanatory variables. BUT we would expect most responses to these variables to be unimodal, not linear.

Jamil et al. 2015 (PLOS ONE) used a Bayesian framework to link phytoplankton community data, environmental variables, and traits. They use a Gaussian logistic model whose parameters (optimum, tolerance, maximum) depend linearly on species traits. We could do this without the trait part for the HABs data, and with the traits for Jeff’s data.
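
For reference, a unimodal response curve of this kind can be written in ter Braak's Gaussian logit parameterization (I am assuming Jamil et al. use this form or a close variant) as logit(p) = a - (x - u)² / (2t²), where u is the optimum, t the tolerance, and 1 / (1 + exp(-a)) the maximum probability. It is equivalent to a logistic GLM with a quadratic term, logit(p) = b0 + b1·x + b2·x² with b2 < 0, which is also one way to let a per-OTU GLM respond unimodally. A small sketch of the curve and the coefficient conversion:

```python
import math


def gaussian_logit(x: float, optimum: float, tolerance: float, a: float) -> float:
    """Probability of occurrence at environment value x under a Gaussian logit curve.

    logit(p) = a - (x - optimum)**2 / (2 * tolerance**2); the peak probability,
    reached at x == optimum, is 1 / (1 + exp(-a)).
    """
    eta = a - (x - optimum) ** 2 / (2 * tolerance**2)
    # Numerically stable logistic: avoid exp overflow for very negative eta.
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)


def quadratic_to_gaussian(b0: float, b1: float, b2: float) -> tuple[float, float, float]:
    """Convert logit(p) = b0 + b1*x + b2*x**2 (b2 < 0) to (optimum, tolerance, a)."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for a unimodal (bell-shaped) curve")
    optimum = -b1 / (2 * b2)
    tolerance = 1 / math.sqrt(-2 * b2)
    a = b0 - b1**2 / (4 * b2)
    return optimum, tolerance, a


# Hypothetical species with optimum 20 (e.g. degrees C), tolerance 3, and peak logit 2.
print(round(gaussian_logit(20, optimum=20, tolerance=3, a=2), 3))  # 0.881
```

The conversion means an ordinary quadratic logistic fit per OTU already yields an optimum and tolerance; the trait-dependent part of the Jamil et al. model would sit on top of that.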

Ramette 2007 is a good overview of more familiar multivariate methods like ordination and db-RDA
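
Since no correlations have been run yet, a simple first pass for goal 4 would be a Spearman rank-correlation matrix among the environmental variables. It only assumes monotonic (not linear) relationships, though it still would not capture the unimodal responses discussed above. A minimal stdlib-only sketch, assuming a hypothetical table with one list of per-sample values per variable (the numbers below are made up for illustration):

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Only pairs where both values are present (not None/NaN) are used.
    """
    pairs = [
        (a, b) for a, b in zip(x, y)
        if a is not None and b is not None and not math.isnan(a) and not math.isnan(b)
    ]
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    n = len(pairs)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def spearman_matrix(table: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise Spearman correlations for every pair of variables in `table`."""
    names = list(table)
    return {(a, b): spearman(table[a], table[b]) for a in names for b in names}


# Hypothetical per-sample values; the last sample is missing nitrate.
env = {
    "Nitrate": [5.1, 4.8, 3.9, 2.2, 1.0, float("nan")],
    "Temp": [18.0, 21.5, 24.0, 26.1, 25.0, 15.2],
}
print(round(spearman(env["Nitrate"], env["Temp"]), 3))  # -0.9
```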

Conclusions

  1. Synechococcus
  2. Limnohabitans

Limnohabitans has three oligotypes; it’s unclear whether they are really different ecological units or just 16S copy variants. Limnohabitans is negatively impacted by Microcystis: it completely disappears during the main phase of the Microcystis bloom. Conversely it responds positively to

  3. Chloroplasts

The chloroplast data is messy because there are hundreds of oligotypes. I pruned them down heavily, to just 28, with an M parameter of 5,000. A few trends stick out:
    • Different oligotypes are found in the 100 µm/53 µm fractions than in the 3 µm fraction.
    • There is less diversity in the 100 µm/53 µm fractions (or is this sampling bias, because these samples had lower yields and generally lower sequencing depth?).
    • Overall, chloroplasts are a major contributor to every fraction.
    • The cyan oligotype seems to be present mainly at WE4.
    • The lime green oligotype is present mainly at the beginning and end of the season, when Microcystis is not abundant.
    • The grey oligotype coexists with MC but appears to be negatively impacted?

  4. Correlations and multivariate analyses: see the papers above.

  5. Edna’s stuff: I didn’t want to work on this while Marian was gone, and I didn’t have time last week, but I will make it a priority over the next week.

  6. Other things I did: